A Regularized Version of Adaboost for Pattern Classification in Historic Air Photographs
Abstract
In this work, we present a novel classification method for geoinformatics tasks, based on a regularized version of the AdaBoost algorithm implemented in the GIS GRASS. AdaBoost is a machine learning classification technique based on a weighted combination of different realizations of the same base model. AdaBoost calls a given base learning algorithm iteratively in a series of runs: at each run, the algorithm modifies a set of weights that determine the contribution of each training sample. Initially the weights are set equally; at each step of the algorithm, the weights of incorrectly classified examples are increased so that the base learner is forced to focus on the hard examples in the training set. The final classifier is obtained by a weighted majority vote of the base classifiers. The AdaBoost algorithm is one of the most successful classification methods in use. While the algorithm largely preserves its general and practical applicability, theoretical and experimental work shows that AdaBoost can overfit when it is applied to noisy data. As presented in (Merler et al., 2003), the procedure to regularize AdaBoost consists of sorting data points by hardness, as it emerges from an analysis of the evolution of the AdaBoost weights, and progressively eliminating the hardest points from the data set. In the noisy case, the regularized algorithm consistently appears to reduce both the bias and the net variance components of the model error relative to the standard algorithm. The regularized AdaBoost algorithm was tested on synthetic noisy data. Finally, it was employed to identify craters of exploded bombs in geolocated reconnaissance images. Historical aerial material from World War Two was considered (data from the UXB-Trentino project, Autonomous Province of Trento). The final result is a map of the spatial density of craters that can be used as an indicator of the risk of presence of unexploded aerial bombs.
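The weight update and the hardness-based shaving described in the abstract can be illustrated with a minimal Python sketch. It is not the GRASS implementation: the decision-stump base learner, the use of the mean weight over rounds as a hardness score, the fixed removal fraction, and all function names (`fit_stump`, `adaboost`, `shave_hardest`) are assumptions made only for illustration. Labels are assumed to be in {-1, +1}.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def stump_predict(X, j, t, s):
    return np.where(X[:, j] <= t, s, -s)

def adaboost(X, y, rounds=20):
    """Discrete AdaBoost; also returns the weight vector after every round."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # weights start out equal
    model, history = [], []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # vote of this base classifier
        pred = stump_predict(X, j, t, s)
        w = w * np.exp(-alpha * y * pred)      # raise weights of mistakes
        w /= w.sum()
        model.append((alpha, j, t, s))
        history.append(w.copy())
    return model, np.array(history)

def predict(model, X):
    """Weighted majority vote of the base classifiers."""
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in model)
    return np.where(score >= 0, 1, -1)

def shave_hardest(X, y, fraction=0.1, rounds=20):
    """Drop the hardest points; hardness = mean weight over rounds (assumption)."""
    _, history = adaboost(X, y, rounds)
    hardness = history.mean(axis=0)
    n_keep = int(round(len(y) * (1 - fraction)))
    keep = np.sort(np.argsort(hardness)[:n_keep])
    return X[keep], y[keep], keep
```

In this sketch a regularized classifier would be obtained by calling `shave_hardest` and then retraining with `adaboost` on the returned subset. How many points to remove is left as a free parameter here; in practice it would have to be chosen, for example on held-out data.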
Similar resources
Machine Learning on Historic Air Photographs for Mapping Risk of Unexploded Bombs
We describe an automatic procedure for building risk maps of unexploded ordnance (UXO) based on historic air photographs. The system is based on a cost-sensitive version of AdaBoost regularized by hard point shaving techniques, and integrated with spatial smoothing. The result is a map of the spatial density of craters, an indicator of UXO risk.
ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION
With advances in technology, different tumor features have been collected for Breast Cancer (BC) diagnosis. Processing large data sets poses challenges, including the high storage capacity and time required for access and processing. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
Efficient Multiclass Implementations of L1-Regularized Maximum Entropy
This paper discusses the application of L1-regularized maximum entropy modeling or SL1-Max [9] to multiclass categorization problems. A new modification to the SL1-Max fast sequential learning algorithm is proposed to handle conditional distributions. Furthermore, unlike most previous studies, the present research goes beyond a single type of conditional distribution. It describes and compares ...